In this lab, you will study convolution and review how the different operations change the relationship between input and output.
Import the following libraries:
In [ ]:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage, misc
Convolution is a linear operation similar to a linear equation, dot product, or matrix multiplication. Convolution has several advantages for analyzing images. As discussed in the video, convolution preserves the relationship between elements, and it requires fewer parameters than other methods.
You can see the relationship between the different methods that you learned:
$$\text{linear equation: } y=wx+b$$
$$\text{linear equation with multiple variables, where } \mathbf{x} \text{ is a vector: } \mathbf{y}=\mathbf{wx}+b$$
$$\text{matrix multiplication, where } \mathbf{X} \text{ is a matrix: } \mathbf{y}=\mathbf{wX}+\mathbf{b}$$
$$\text{convolution, where } \mathbf{X} \text{ and } \mathbf{Y} \text{ are tensors: } \mathbf{Y}=\mathbf{w}*\mathbf{X}+\mathbf{b}$$
In convolution, the parameter w is called a kernel. You can perform convolution on images by letting the variable image play the role of X and w the role of the kernel parameter.
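Written out element by element for a single channel, the convolution above can be sketched as follows. The index names i, j, r, and c and the kernel size K are introduced here for illustration only, and the sum follows the convention used by nn.Conv2d, which slides the kernel over the input without flipping it (strictly speaking, a cross-correlation):

$$Y_{i,j}=b+\sum_{r=0}^{K-1}\sum_{c=0}^{K-1} w_{r,c}\,X_{i+r,\,j+c}$$

Each output element is therefore a dot product between the kernel and one K x K patch of the input plus a bias, which is the link back to the linear equation at the top of the list.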
Create a two-dimensional convolution object by using the constructor Conv2d. The parameters in_channels and out_channels are both set to 1 in this section, and the parameter kernel_size is set to 3.
In [ ]:
conv = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=3)
conv
Because the parameters in nn.Conv2d are randomly initialized and normally learned through training, assign them fixed values for this example.
In [ ]:
conv.state_dict()['weight'][0][0]=torch.tensor([[1.0,0,-1.0],[2.0,0,-2.0],[1.0,0.0,-1.0]])
conv.state_dict()['bias'][0]=0.0
conv.state_dict()
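Another common way to overwrite the parameters is to copy values into conv.weight and conv.bias inside a torch.no_grad() block. The sketch below is an alternative to the cell above, not a replacement for it; the names conv_demo and kernel_values are made up for illustration, and the kernel values are the ones used in the lab.

```python
import torch
import torch.nn as nn

# Hypothetical names: conv_demo and kernel_values are for illustration only.
conv_demo = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

kernel_values = torch.tensor([[1.0, 0.0, -1.0],
                              [2.0, 0.0, -2.0],
                              [1.0, 0.0, -1.0]])

# no_grad() keeps autograd from recording the in-place overwrite.
with torch.no_grad():
    # The weight has shape (out_channels, in_channels, rows, columns).
    conv_demo.weight.copy_(kernel_values.reshape(1, 1, 3, 3))
    conv_demo.bias.zero_()

print(conv_demo.weight.shape)
print(conv_demo.state_dict()["bias"])
```

Both approaches write into the same underlying tensors, so conv_demo.state_dict() reports the new values afterwards.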
Create a dummy tensor to represent an image. The shape of the image is (1,1,5,5), where the dimensions are:
(number of inputs, number of channels, number of rows, number of columns)
Set the third column to 1:
In [ ]:
image=torch.zeros(1,1,5,5)
image[0,0,:,2]=1
image
Call the object conv on the tensor image to perform the convolution, and assign the result to the tensor z.
In [ ]:
z=conv(image)
z
The following animation illustrates the process. The kernel is multiplied element by element with the region of the image that it currently covers, and the products are added together to give one output value. The kernel is then shifted and the process is repeated.
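The multiply, add, and shift steps described above can be sketched in plain Python, without PyTorch. The function name slide_kernel and the list-of-rows representation are assumptions made for this illustration; the image and kernel values are the ones used earlier in the lab.

```python
def slide_kernel(image, kernel, bias=0.0):
    """Slide a square `kernel` over `image` (lists of rows), one position at a time."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Multiply the kernel with the patch under it, element by element,
            # then add the products together.
            total = bias
            for r in range(k):
                for c in range(k):
                    total += kernel[r][c] * image[i + r][j + c]
            row.append(total)
        output.append(row)
    return output


# 5x5 image of zeros with the third column set to 1, as in the lab.
image_rows = [[1.0 if col == 2 else 0.0 for col in range(5)] for _ in range(5)]
kernel_rows = [[1.0, 0.0, -1.0],
               [2.0, 0.0, -2.0],
               [1.0, 0.0, -1.0]]

for row in slide_kernel(image_rows, kernel_rows):
    print(row)  # each row is [-4.0, 0.0, 4.0]
```

The kernel responds with -4 and 4 on either side of the column of ones and with 0 where it is centered on it, which is the same pattern that the tensor z should show.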
The size of the output is an important parameter. In this lab, you will assume square images. For rectangular images, the same formula can be used for each dimension independently.
Let M be the size of the input and K be the size of the kernel. The size of the output is given by the following formula:
$$M_{new}=M-K+1$$
Create a kernel of size 2:
In [ ]:
K=2
conv1 = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=K)
conv1.state_dict()['weight'][0][0]=torch.tensor([[1.0,1.0],[1.0,1.0]])
conv1.state_dict()['bias'][0]=0.0
conv1.state_dict()
conv1
Create an image of size 4:
In [ ]:
M=4
image1=torch.ones(1,1,M,M)
The following equation provides the output size:
$$M_{new}=M-K+1=4-2+1=3$$
The following animation illustrates the process: The first iteration of the kernel overlay of the images produces one output. As the kernel is of size K, there are M-K elements for the kernel to move in the horizontal direction. The same logic applies to the vertical direction.
Perform the convolution and verify the size is correct:
In [ ]:
z1=conv1(image1)
print("z1:",z1)
print("shape:",z1.shape[2:4])
The parameter stride sets how many elements the kernel moves per iteration. As a result, the output size also changes and is given by the following formula:
$$M_{new}=\dfrac{M-K}{stride}+1$$
Create a convolution object with a stride of 2:
In [ ]:
conv3 = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=2,stride=2)
conv3.state_dict()['weight'][0][0]=torch.tensor([[1.0,1.0],[1.0,1.0]])
conv3.state_dict()['bias'][0]=0.0
conv3.state_dict()
For an image with a size of 4, calculate the output size:
$$M_{new}=\dfrac{M-K}{stride}+1=\dfrac{4-2}{2}+1=2$$
The following animation illustrates the process: The first iteration of the kernel overlay of the image produces one output. Because the kernel is of size K, there are M-K=2 elements left for it to move across. The stride is 2, so the kernel moves 2 elements at a time. As a result, you divide M-K by the stride value 2 and add 1 for the first position, which gives an output size of 2.
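The counting argument above can be checked by listing the positions at which the kernel is placed along one dimension. This is a small sketch in plain Python; the helper name kernel_starts is made up for illustration.

```python
def kernel_starts(m, k, stride=1):
    """Start indices at which a size-k kernel fits inside a size-m input."""
    return list(range(0, m - k + 1, stride))


starts = kernel_starts(m=4, k=2, stride=2)
print(starts)       # [0, 2]
print(len(starts))  # 2, matching (4 - 2) / 2 + 1
```

With stride=1 the same helper returns three start positions for m=4 and k=2, which agrees with the earlier output size of 3.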
Perform the convolution and verify the size is correct:
In [ ]:
z3=conv3(image1)
print("z3:",z3)
print("shape:",z3.shape[2:4])
As you apply successive convolutions, the image will shrink. You can apply zero padding to keep the image at a reasonable size, which also holds information at the borders.
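To see the shrinking concretely, the sketch below tracks the size along one dimension through repeated convolutions, with and without zero padding. The function names output_size and sizes_after are hypothetical, and the formula assumes a dilation of 1.

```python
def output_size(m, k, stride=1, padding=0):
    """Output size along one dimension, assuming a dilation of 1."""
    return (m + 2 * padding - k) // stride + 1


def sizes_after(layers, m, k, padding):
    """Sizes seen after each of `layers` identical stride-1 convolutions."""
    sizes = [m]
    for _ in range(layers):
        sizes.append(output_size(sizes[-1], k, padding=padding))
    return sizes


print(sizes_after(layers=2, m=5, k=3, padding=0))  # [5, 3, 1]
print(sizes_after(layers=2, m=5, k=3, padding=1))  # [5, 5, 5]
```

For a kernel of size 3, one row and column of zeros on each side is enough to keep the size unchanged.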
In addition, the formula might not give an integer value for the output size, in which case PyTorch rounds the result down. Consider the following image:
In [ ]:
image1
Try performing convolutions with kernel_size=2 and stride=3. Use these values:
In [ ]:
conv4 = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=2,stride=3)
conv4.state_dict()['weight'][0][0]=torch.tensor([[1.0,1.0],[1.0,1.0]])
conv4.state_dict()['bias'][0]=0.0
conv4.state_dict()
z4=conv4(image1)
print("z4:",z4)
print("shape:",z4.shape[2:4])
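The sketch below shows in plain Python why the output of this cell has size 1: the formula gives a non-integer value, the size is rounded down, and part of the image is never covered by the kernel. The lowercase variable names are chosen for this illustration so that they do not overwrite M and K.

```python
m, k, stride = 4, 2, 3

exact = (m - k) / stride + 1   # about 1.67, not a valid size
used = (m - k) // stride + 1   # rounded down to 1

# Columns touched by at least one kernel position.
covered = set()
for start in range(0, m - k + 1, stride):
    covered.update(range(start, start + k))
skipped = sorted(set(range(m)) - covered)

print(exact, used)
print(skipped)  # [2, 3]: these columns never reach the output
```

The same holds for the rows, so only the top-left 2 x 2 corner of image1 contributes to z4.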
You can add rows and columns of zeros around the image. This is called padding. In the constructor Conv2d, you specify the number of rows or columns of zeros that you want to add on each side with the parameter padding.
For a square image, you pad columns of zeros before the first column and after the last column, and repeat the process for the rows. As a result, the width and height are the original size plus 2 x the number of padding elements specified. You can then determine the size of the output after subsequent operations accordingly, as shown in the following equations, where you determine the size of an image after padding and then after applying a convolution kernel of size K:
$$M'=M+2 \times padding$$
$$M_{new}=M'-K+1$$
With a stride other than 1, the same substitution gives $M_{new}=\lfloor (M'-K)/stride \rfloor+1$.
Consider the following example:
In [ ]:
conv5 = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=2,stride=3,padding=1)
conv5.state_dict()['weight'][0][0]=torch.tensor([[1.0,1.0],[1.0,1.0]])
conv5.state_dict()['bias'][0]=0.0
conv5.state_dict()
z5=conv5(image1)
print("z5:",z5)
print("shape:",z5.shape[2:4])
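The values in z5 can be reproduced by hand. The sketch below pads a 4 x 4 image of ones with zeros and then sums each 2 x 2 patch at stride 3, which is what an all-ones kernel with zero bias computes. The helper names zero_pad and strided_sum are made up for illustration, and the code assumes a square image as the lab does.

```python
def zero_pad(image, padding):
    """Surround a list-of-rows image with `padding` rows and columns of zeros."""
    width = len(image[0]) + 2 * padding
    blank = [[0.0] * width for _ in range(padding)]
    middle = [[0.0] * padding + list(row) + [0.0] * padding for row in image]
    return blank + middle + [list(r) for r in blank]


def strided_sum(image, k, stride):
    """All-ones kernel: each output is the sum of the k x k patch under it."""
    starts = range(0, len(image) - k + 1, stride)
    return [[sum(image[i + r][j + c] for r in range(k) for c in range(k))
             for j in starts] for i in starts]


ones = [[1.0] * 4 for _ in range(4)]
padded = zero_pad(ones, padding=1)
print(len(padded), len(padded[0]))         # 6 6
print(strided_sum(padded, k=2, stride=3))  # [[1.0, 2.0], [2.0, 4.0]]
```

The corner outputs differ because each patch overlaps a different number of padded zeros; only the bottom-right patch lies entirely inside the original image.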
The process is summarized in the following animation:
A kernel of zeros with a kernel size=3 is applied to the following image:
In [ ]:
Image=torch.randn((1,1,4,4))
Image
Question: Without running the convolution, determine the value of each element of the output.
Question: Use the following convolution object to perform convolution on the tensor Image:
In [ ]:
conv = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=3)
conv.state_dict()['weight'][0][0]=torch.tensor([[0,0,0],[0,0,0],[0,0.0,0]])
conv.state_dict()['bias'][0]=0.0
Question: You have an image of size 4. The parameters are as follows: kernel_size=2, stride=2. What is the size of the output?
In [ ]:
Joseph Santarcangelo has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition.
Other contributors: Michelle Carey, Mavis Zhou
Copyright © 2018 cognitiveclass.ai. This notebook and its source code are released under the terms of the MIT License.